Proceedings of the 2002 Winter Simulation Conference
E. Yücesan, C.-H. Chen, J. L. Snowdon, and J. M. Charnes, eds.

RESPONSE SURFACE METHODOLOGY REVISITED

Ebru Angün, Jack P.C. Kleijnen
Department of Information Management/Center for Economic Research (CentER)
School of Economics and Business Administration
Tilburg University, 5000 LE Tilburg, THE NETHERLANDS

Dick Den Hertog, Gül Gürkan
Department of Econometrics and Operations Research/Center for Economic Research (CentER)
School of Economics and Business Administration
Tilburg University, 5000 LE Tilburg, THE NETHERLANDS

ABSTRACT

Response Surface Methodology (RSM) searches for the input combination that optimizes the simulation output. RSM treats the simulation model as a black box. Moreover, this paper assumes that simulation requires much computer time. In the first stages of its search, RSM locally fits first-order polynomials. Next, classic RSM uses steepest descent (SD); unfortunately, SD is scale dependent. Therefore, Part 1 of this paper derives scale independent adapted SD (ASD), accounting for covariances between components of the local gradient. Monte Carlo experiments show that ASD indeed gives a better search direction than SD. Part 2 considers multiple outputs, optimizing a stochastic objective function under stochastic and deterministic constraints. This part uses interior point methods and binary search to derive a scale independent search direction and several step sizes in that direction. Monte Carlo examples demonstrate that a neighborhood of the true optimum can indeed be reached, in a few simulation runs.

1 INTRODUCTION

RSM was invented by Box and Wilson (1951) for finding the input combination that minimizes the output of a real, non-simulated system. They ignored constraints. Also see recent publications such as Box (1999), Khuri and Cornell (1996), Myers (1999), and Myers and Montgomery (1995). Later on, RSM was also applied to random simulation models, treating these models as black boxes (a black box means that there is no gradient information available; see Spall (1999)). Classic articles are Donohue, Houck, and Myers (1993, 1995); recent publications are Irizarry, Wilson, and Trevino (2001), Kleijnen (1998), Law and Kelton (2000, pp. 646-655), Nedermeijer et al. (2000), and Safizadeh (2002).

Technically, RSM is a stagewise heuristic that searches through various local (sub)areas of the global area in which the simulation model is valid. We focus on the first stage, which fits first-order polynomials in the inputs, per local area. This fitting uses Ordinary Least Squares (OLS) and estimates the SD path, as follows. Let d_j denote the value of the original (non-standardized) input j with j = 1, ..., k. Hence k main or first-order effects (say) β_j are to be estimated in the local first-order polynomial approximation. For this estimation, classic RSM uses resolution-3 designs, which specify the n ≥ k + 1 input combinations to be simulated. (Spall (1999) proposes to simulate only two combinations in his simultaneous perturbation stochastic approximation or SPSA.) These input/output (I/O) combinations give the OLS estimates β̂_j, and the SD path uses the local gradient (β̂_1, ..., β̂_k).

Unfortunately, RSM suffers from two well-known problems; see Myers and Montgomery (1995, pp. 192-194): (i) SD is scale dependent; (ii) the step size along the SD path is selected intuitively. For example, in a case study, Kleijnen (1993) uses a step size that doubles the most important input. Our research contribution is the following.
In Part 1 (Sections 2-3) we derive ASD; that is, we adjust the estimated first-order factor effects through their estimated covariance matrix. We prove that ASD is scale independent. In most of our Monte Carlo experiments with simple test functions, ASD indeed gives a better search direction. Note that we examine only the search direction, not the other elements of classic RSM. In Part 2 (Sections 4-5) we consider multiple outputs, whereas classic RSM assumes a single output. We optimize a stochastic objective function under multiple stochastic and deterministic constraints. We derive a scale independent search direction (inspired by interior point methods) and several step sizes (inspired by binary search).

This search direction, namely scaled and projected SD, is a generalization of classic RSM's SD. We then combine this search direction and these step sizes into an iterative heuristic. Notice that if there are no binding constraints at the optimum, then classic RSM combined with ASD might suffice.

The remainder of this paper is organized as follows. For the unconstrained problem, Section 2 derives ASD, its mathematical properties, and its interpretation. Section 3 compares the search directions SD and ASD by means of Monte Carlo experiments. Section 4 derives a novel heuristic combining a search direction and a step size procedure for constrained problems. Section 5 studies the performance of the novel heuristic by means of Monte Carlo experiments. Section 6 gives conclusions. Note that this paper summarizes two separate papers, namely Kleijnen, Den Hertog, and Angün (2002) and Angün et al. (2002), which give all mathematical proofs and additional experimental results.

2 ADAPTED STEEPEST DESCENT

RSM uses the following approximation:

y = β_0 + Σ_{j=1}^k β_j d_j + e    (1)

where y denotes the predictor of the expected simulation output, and e denotes the noise, consisting of intrinsic noise caused by the simulation's pseudo-random numbers (PRN) plus lack of fit. RSM assumes white noise; that is, e is normally, identically, and independently distributed with zero mean μ_e and constant variance σ_e². The OLS estimator of the q = k + 1 parameters β = (β_0, ..., β_k)' in (1) is

β̂ = (X'X)^{-1} X'w    (2)

with

X: N × q matrix of explanatory variables, including the dummy variable with constant value 1; X is assumed to have linearly independent columns
N = Σ_{i=1}^n m_i: number of simulation runs
m_i: number of replicates at input combination i, with m_i > 0
n: number of different, simulated input combinations, with n ≥ q
w: vector with the N simulation outputs w_{i;r} (r = 1, ..., m_i).

The noise in (1) may be estimated through the mean squared residual (MSR):

σ̂_e² = Σ_{i=1}^n Σ_{r=1}^{m_i} (w_{i;r} − ŷ_i)² / (N − q)    (3)

where ŷ_i follows from (1) and (2): ŷ_i = β̂_0 + Σ_{j=1}^k β̂_j d_{i;j}.

Kleijnen et al. (2002) derives the design point that minimizes the variance of the regression predictor:

d_0 = −C^{-1} b

where σ_e² C is the covariance matrix of β̂_{−0}, which equals β̂ excluding the intercept β̂_0:

cov(β̂) = σ_e² (X'X)^{-1} = σ_e² ( a  b' ; b  C )    (4)

where a is a scalar, b a k-dimensional vector, and C a k × k matrix.

Now we consider the one-sided 1 − α confidence interval ranging from −∞ to

ŷ_max(x) = x'β̂ + t_{N−q}^α σ̂_e √( x'(X'X)^{-1} x )    (5)

where x = (1, d')' and t_{N−q}^α denotes the 1 − α quantile of the t distribution with N − q degrees of freedom.

ASD selects d⁺, the design point that minimizes the maximum output predicted through (5) (this gives both a search direction and a step size). Kleijnen et al. (2002) derives

d⁺ = −C^{-1} b − λ C^{-1} β̂_{−0}    (6a)

where −C^{-1} b is derived from (4), −C^{-1} β̂_{−0} is the ASD direction, and λ is the step size:

λ = t_{N−q}^α σ̂_e √( (a − b'C^{-1}b) / ( (t_{N−q}^α)² σ̂_e² − β̂'_{−0} C^{-1} β̂_{−0} ) ).    (6b)
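A minimal NumPy sketch may clarify the computation in (2) through (6b); the exact algebraic form of (6b) is our reading of the garbled source, and the variable names (X, w, alpha) are illustrative rather than the authors' code:

```python
import numpy as np
from scipy import stats

def asd_step(X, w, alpha=0.05):
    """OLS fit (2), MSR (3), partition (4), and the ASD point (6a)-(6b)."""
    N, q = X.shape                          # N runs, q = k + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ w            # OLS estimator, equation (2)
    resid = w - X @ beta_hat
    sigma2_hat = resid @ resid / (N - q)    # mean squared residual, equation (3)
    a = XtX_inv[0, 0]                       # partition (4): scalar a,
    b = XtX_inv[1:, 0]                      # k-vector b,
    C = XtX_inv[1:, 1:]                     # and k x k matrix C
    C_inv = np.linalg.inv(C)
    beta_m0 = beta_hat[1:]                  # beta_hat excluding the intercept
    t = stats.t.ppf(1 - alpha, N - q)       # 1 - alpha quantile of t_{N-q}
    denom = t**2 * sigma2_hat - beta_m0 @ C_inv @ beta_m0
    if denom <= 0:                          # large signal/noise: no finite step
        return None
    lam = t * np.sqrt(sigma2_hat * (a - b @ C_inv @ b) / denom)  # equation (6b)
    return -C_inv @ b - lam * (C_inv @ beta_m0)                  # equation (6a)
```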

We mention the following mathematical properties and interpretations of ASD. The first term in (6a) means that the ASD path starts from the point with minimal predictor variance. The second term means that the classic SD direction β̂_{−0} (the second term's last factor) is adjusted for the covariance matrix of β̂_{−0}; see C in (4). The step size λ is quantified in (6b). Kleijnen et al. (2002) proves that ASD is scale independent.

In case of large signal/noise ratios β̂_j²/var(β̂_j), the denominator under the square root in (6b) is negative, so (6) does not give a finite solution for d⁺. Indeed, if the noise is negligible, we have a deterministic problem, which our technique is not meant to address (many other researchers, including Conn et al. (2000), study optimization of deterministic simulation models). In case of a small signal/noise ratio, no step is taken. Kleijnen et al. (2002) further discusses two subcases: (i) the signal is small; (ii) the noise is big.

3 COMPARISON OF ASD AND SD THROUGH MONTE CARLO EXPERIMENTS

To compare the ASD and SD search directions, we perform Monte Carlo experiments. The Monte Carlo method is an efficient and effective way to estimate the behavior of search techniques applied to random simulations; see Kleijnen et al. (2002). We limit the example to two inputs, so k = 2. We generate the simulation output w through a second-order polynomial with white noise:

w = β_0 + β_1 d_1 + β_2 d_2 + β_{1;1} d_1² + β_{2;2} d_2² + β_{1;2} d_1 d_2 + e.    (7)

The response surface (7) holds for the global area, for which we take the unit square: −1 ≤ d_1 ≤ 1 and −1 ≤ d_2 ≤ 1. (We have already seen that RSM fits first-order polynomials locally.) In the local area we use a one-at-a-time design, because this design is non-orthogonal, and in practice designs in the original inputs are not orthogonal (see Kleijnen et al. (2002)). The specific local area is in the lower corner of Figure 1. To enable the computation of the MSR, we simulate one input combination twice: m_1 = 2.

We consider a specific case of (7): β_0 = β_1 = β_2 = 0, β_{1;2} = 2, β_{1;1} = −2, β_{2;2} = −1, so the contour functions (for example, iso-cost curves) form ellipsoids tilted relative to the d_1 and d_2 axes. Hence, (7) has as its true optimum d* = (0, 0). After fitting a first-order polynomial, we estimate the SD and ASD paths starting from d_0 = (0.85, −0.95); see d_0 explained above (4).

Figure 1: Tilted Ellipsoid Contours E(w | d_1, d_2) with Global and Local Experimental Areas

In this Monte Carlo experiment we know the truly optimal search direction, namely the vector (say) g that starts at d_0 and ends at the true optimum (0, 0). So we compute the angle (say) θ̂ between the true search direction g and the estimated search direction p:

θ̂ = arccos( g'p / (‖g‖ · ‖p‖) ).    (8)

Obviously, the smaller θ̂ is, the better the search technique performs. To estimate the distribution of θ̂ defined in (8), we take 100 macro-replicates. Figure 2 shows a bundle of 100 p's around g when σ_e = 0.10. In each macro-replicate, we apply SD and ASD to the same I/O data (w, d_1, d_2). We characterize the resulting empirical θ̂ distribution through several statistics, namely its average, standard deviation, and specific quantiles; see Table 1.

Figure 2: 100 ASD Search Directions p and the Truly Optimal Search Direction g (Marked by Thick Dots)
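This experiment can be sketched as follows; the local design step of 0.1, the seed, and the use of the improving (ascent) sign for both search directions are our assumptions, since the reconstructed test function attains its optimum at the maximum (0, 0):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_e = 0.10
d0 = np.array([0.85, -0.95])
g = -d0                                  # truly optimal direction, toward (0, 0)

def w(d):                                # test function (7), reconstructed case
    d1, d2 = d
    return -2*d1**2 - d2**2 + 2*d1*d2 + rng.normal(0.0, sigma_e)

def angle_deg(p):                        # equation (8), in degrees
    c = p @ g / (np.linalg.norm(p) * np.linalg.norm(g))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

ang_sd, ang_asd = [], []
for _ in range(100):                     # 100 macro-replicates
    # one-at-a-time design; the first combination is simulated twice (m_1 = 2)
    D = np.array([d0, d0, d0 + [0.1, 0.0], d0 + [0.0, 0.1]])
    X = np.column_stack([np.ones(len(D)), D])
    y = np.array([w(d) for d in D])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    C_inv = np.linalg.inv(XtX_inv[1:, 1:])
    ang_sd.append(angle_deg(beta[1:]))           # SD: local gradient
    ang_asd.append(angle_deg(C_inv @ beta[1:]))  # ASD: C^{-1} beta_hat_{-0}
print("mean angle SD:", np.mean(ang_sd), "ASD:", np.mean(ang_asd))
```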

Table 1: Statistics in Case of Interactions, for ASD's and SD's Estimated Angle Error θ̂ (in Degrees)

Statistics             σ_e = 0.10          σ_e = 0.25
                       ASD      SD         ASD      SD
Average                9.72     16.01      10.14    17.33
Standard deviation     3.30     6.32       7.69     12.88
Median (50% quantile)  9.68     16.02      8.99     14.94
75% quantile           12.37    21.12      16.13    27.87
25% quantile           6.99     10.76      3.12     5.84
95% quantile           15.66    27.05      24.78    41.55
5% quantile            4.99     6.80       0.61     0.81
100% quantile          17.41    30.08      32.07    50.99
0% quantile            0.85     1.46       0.04     0.25

Further, we perform the Monte Carlo experiment for two noise values: σ_e is 0.10 or 0.25. We use the same PRN for both values. In case of high noise, the estimated search directions may be very wrong. Nevertheless, ASD still performs better; see again Table 1. In general, ASD performs better than SD, unless we focus on outliers; see Kleijnen et al. (2002).

4 MULTIPLE RESPONSES: INTERIOR POINT AND BINARY SEARCH APPROACH

In Part 1, we assumed a single response of interest, denoted by w in (2). Now we consider a more realistic situation, namely the simulation generates multiple outputs. For example, an academic inventory simulation defines w in (2) as the sum of inventory-carrying, ordering, and out-of-stock costs, whereas a practical simulation minimizes the sum of the expected inventory-carrying and ordering costs provided the service probability exceeds a pre-specified value.

In RSM, there have been several approaches to multiresponse optimization. Khuri (1996) surveys most of these approaches (including desirability functions, generalized distances, and dual responses). Angün et al. (2002) discusses drawbacks of these approaches. To overcome these drawbacks, we propose the following alternative, based on mathematical programming. We select one of the responses as the objective and the remaining (say) z − 1 responses as constraints. The SD search would soon hit the boundary of the feasible area formed by these constraints, and would then creep along this boundary. Instead, our search starts in the interior of the feasible area and avoids the boundary; see Barnes (1986) on Karmarkar's algorithm for linear programming. Note that our approach has the additional advantage of avoiding areas in which the simulation model is not valid and may even crash. Formally, our problem becomes:

minimize E(w_0(d))
subject to E(w_h(d)) ≤ a_h for h = 1, ..., z − 1    (9)
l ≤ d ≤ u

where d is the vector of simulation inputs, l and u are the deterministic lower and upper bounds on d, a_h is the right-hand-side value for the h-th constraint, and w_h (h = 0, ..., z − 1) is response h. Note that probabilities (for example, service percentages) can be formulated as expected values of indicator functions. Further, the multiple simulation responses are correlated, since they are estimated through the same PRN fed into the same simulation model.

As in Part 1, we locally fit a first-order polynomial, but now for each response; see (1). However, the noise e is now multivariate normal, still with zero means but now with covariance matrix (say) Σ. Yet, since the same design is used for all z responses, the GLS estimator reduces to the OLS estimator; see Ruud (2000, p. 703). Therefore we still use (2), but to the symbol β̂ we add the subscript h. Further, for h, h' = 0, ..., z − 1 we estimate Σ through the analogue of (3):

σ̂_{h,h'} = (w_h − ŷ_h)'(w_{h'} − ŷ_{h'}) / (N − (k + 1)).    (10)

We introduce B = (b_1, ..., b_{z−1})', where b_h (h = 1, ..., z − 1) denotes the vector of OLS estimates β̂_{−0;h} (excluding the intercept β̂_{0;h}) for the h-th response.
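A minimal sketch of these per-response OLS fits and the covariance estimate (10), assuming the outputs are collected in an N × z array W whose first column is the objective w_0:

```python
import numpy as np

def fit_responses(X, W):
    N, q = X.shape                            # q = k + 1
    beta = np.linalg.inv(X.T @ X) @ X.T @ W   # equation (2), one column per response
    resid = W - X @ beta
    Sigma_hat = resid.T @ resid / (N - q)     # equation (10): estimated Sigma
    b0 = beta[1:, 0]                          # beta_hat_{-0;0}: objective gradient
    B = beta[1:, 1:].T                        # rows b_h' for the z - 1 constraints
    return beta, Sigma_hat, b0, B
```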
Adding slack vectors s, r, and v, we obtain

minimize b_0'd
subject to Bd + s = c, d + r = u, d − v = l,    (11)
s, r, v ≥ 0

where b_0 denotes the vector of OLS estimates β̂_{−0;0} (excluding the intercept β̂_{0;0}) for w_0, and c is the vector with components c_h = a_h − β̂_{0;h} (h = 1, ..., z − 1). Through (11) we obtain a local linear approximation for (9). Then, using ideas from interior point methods, more specifically the affine scaling method, Angün et al. (2002) derives the following search direction:

p = −(B'S^{-2}B + R^{-2} + V^{-2})^{-1} b_0    (12)

where S, R, and V are diagonal matrices with the current estimated slack vectors s, r, v > 0 on the diagonal.
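The following sketch computes (12); the squared inverse slack matrices and the minus sign are our reading of the affine scaling method, since both were lost in the source:

```python
import numpy as np

def search_direction(b0, B, s, r, v):
    S2 = np.diag(1.0 / s**2)             # S^{-2}
    R2 = np.diag(1.0 / r**2)             # R^{-2}
    V2 = np.diag(1.0 / v**2)             # V^{-2}
    M = B.T @ S2 @ B + R2 + V2           # scales and projects the SD direction
    return -np.linalg.solve(M, b0)       # equation (12)
```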

Obviously, R and V in (12) are known as soon as the deterministic input to the simulation is selected; S in (12) is estimated from the simulation output, not from the local approximation. Unlike SD, the search direction (12) is scale independent (the inverse of the matrix within the parentheses in (12) scales and projects the estimated SD direction, b_0).

Having estimated a search direction for a specific starting point through (12), we must next select a step size. Actually, we run the simulation model for several step sizes in that direction, as follows. First, we compute the maximum step size assuming that the local approximation (11) holds globally:

λ_max = max{0, min{λ_1, λ_2, λ_3}}

where

λ_1 = min{(c_h − b_h'd)/(p'b_h) : h ∈ {1, ..., z − 1}, p'b_h > 0}
λ_2 = min{(u_j − d_j)/p_j : j ∈ {1, ..., k}, p_j > 0}
λ_3 = min{(l_j − d_j)/p_j : j ∈ {1, ..., k}, p_j < 0}.

To increase the probability of staying within the interior of the feasible region, we take only 80% of λ_max as our maximum step size. The subsequent step sizes are inspired by binary search, as follows. We systematically halve the current step size along the search direction. At each step, we select as the best point the one with the minimum value for the simulation objective w_0, provided it is feasible. We stop the search in a particular direction after a user-specified number of iterations (say) G = 3. For details see Angün et al. (2002). A sketch of this step-size procedure follows at the end of this section.

For all these steps we use common random numbers (CRN), in order to better test whether the objective improves. Moreover, we test whether the other z − 1 responses remain within the feasible area: we test the slack vector s introduced in (11). These z tests use ratios instead of absolute differences, to avoid scale dependence. A statistical complication is that these ratios may not have finite moments. Therefore we test their medians (not their means). For these tests we use Monte Carlo sampling, which takes negligible computer time compared with the expensive simulation runs. This Monte Carlo takes (say) K = 1000 samples from the assumed distributions, with means and variances estimated through the simulation; in the numerical example of the next section we assume z normal distributions, ignoring correlations between these responses. From these Monte Carlo samples we compute slack ratios. We formulate pessimistic null-hypotheses; that is, we accept an input combination only if it gives a significantly lower objective value and all its z − 1 slacks imply a feasible solution. Actually, our interior point method implies that a new slack value is a percentage, say 20%, of the old slack value; our pessimistic hypotheses make our acceptable area smaller than the original feasible area in (9).

After we have run G simulations along the search path, we find the best solution so far. Now we wish to re-estimate the search direction p defined in (12). Therefore we again use a resolution-3 design in the k factors. We still use CRN; actually, we take the same seeds as we used for the very first local exploration. We save one (expensive) run by using the best combination found so far as one of the combinations for the design. We stop the whole search when either the computer budget is exhausted or the search returns to an old combination twice. When the search returns to an old combination for the first time, we use a new set of seeds.
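Here is the announced sketch of λ_max and the halving step sizes; the feasibility and improvement tests based on CRN and median ratios are omitted for brevity:

```python
import numpy as np

def step_sizes(d, p, B, c, l, u, G=3):
    Bd, Bp = B @ d, B @ p
    cand = [np.inf]
    cand += [(c[h] - Bd[h]) / Bp[h] for h in range(len(c)) if Bp[h] > 0]
    cand += [(u[j] - d[j]) / p[j] for j in range(len(d)) if p[j] > 0]
    cand += [(l[j] - d[j]) / p[j] for j in range(len(d)) if p[j] < 0]
    lam_max = 0.8 * max(0.0, min(cand))        # 80% of lambda_max: stay interior
    return [lam_max / 2**i for i in range(G)]  # G systematically halved steps
```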
5 MONTE CARLO EXPERIMENTS FOR MULTIPLE RESPONSES

Like in Part 1 (Section 3), we study the novel procedure, explained in Section 4, by means of a Monte Carlo example. As in Section 3, we assume globally valid functions quadratic in two inputs, but now we consider three responses; moreover, we add deterministic box constraints for the two inputs:

minimize E(5(d_1 − 1)² + (d_2 − 5)² + 4 d_1 d_2 + e_0)
subject to E((d_1 − 3)² + d_2² + d_1 d_2 + e_1) ≤ 4
E(d_1² + 3(d_2 + 1.061)² + e_2) ≤ 9
0 ≤ d_1 ≤ 3, 0 ≤ d_2 ≤ 1

where the noise has σ_0 = 1, σ_1 = 0.15, σ_2 = 0.4, and correlations ρ_{0;1} = 0.6, ρ_{0;2} = 0.3, ρ_{1;2} = 0.1. It is easy to derive the analytical solution as d* = (1.24, 0.52), with a mean objective value of 22.96 approximately.

We select the initial local area shown in the lower left corner of Figure 3. We run 100 macro-replicates; Figure 3 displays the macro-replicate that gives the median result for the objective; that is, 50% of the macro-replicates have worse objective values. In this figure we have d̂ = (1.46, 0.49) and an estimated objective of 25.30 approximately. Table 2 summarizes the 100 macro-replicates, where Criterion 1 is the relative expected objective (E(w_0(d_1, d_2)) − 22.96)/22.96; Criteria 2 and 3 stand for the relative expected slacks (4 − E(w_1(d_1, d_2)))/4 and (9 − E(w_2(d_1, d_2)))/9 for the first and the second constraints.
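A sketch of a simulator for this test problem; the reconstructed coefficients and the noise covariance assembled from the stated standard deviations and correlations follow our reading of the source:

```python
import numpy as np

rng = np.random.default_rng(1)
sig = np.array([1.0, 0.15, 0.4])
rho = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.1],
                [0.3, 0.1, 1.0]])
cov = np.outer(sig, sig) * rho          # covariance of (e_0, e_1, e_2)

def simulate(d):
    d1, d2 = d
    e = rng.multivariate_normal(np.zeros(3), cov)  # correlated noise
    w0 = 5*(d1 - 1)**2 + (d2 - 5)**2 + 4*d1*d2 + e[0]   # objective
    w1 = (d1 - 3)**2 + d2**2 + d1*d2 + e[1]             # constraint: <= 4
    w2 = d1**2 + 3*(d2 + 1.061)**2 + e[2]               # constraint: <= 9
    return np.array([w0, w1, w2])

print(simulate(np.array([1.24, 0.52])))  # near the analytical optimum
```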

Figure 3: The Median (50th Quantile) of 100 Estimated Solutions

Table 2: Estimated Objective and Slacks over 100 Macro-replicates

Quantile   Criterion 1   Criterion 2   Criterion 3
10th       0.04          0.03          0.03
25th       0.06          0.12          0.15
50th       0.10          0.25          0.29
75th       0.19          0.43          0.49
90th       0.18          0.61          0.50

Ideally, Criterion 1 is zero; Criteria 2 and 3 are zero if the constraints are binding at the optimum. Our heuristic tends to end at a feasible combination: the table displays only positive quantiles for Criteria 2 and 3. This feasibility is explained by our pessimistic null hypotheses (and our small significance level α = 0.01). Our conclusion is that the heuristic reaches the desired neighborhood of the real optimum in a relatively small number of simulation runs. Once the heuristic reaches this neighborhood, it usually stops at a feasible point.

6 CONCLUSIONS

In Part 1 of this paper we addressed the problem of searching for the simulation input combination that minimizes the output. RSM is a classic technique for tackling this problem, but it uses SD, which is scale dependent. Therefore we devised adapted SD (ASD), which corrects for the covariances of the estimated gradient components. ASD is scale independent. Our Monte Carlo experiments demonstrate that, in general, ASD gives a better search direction than SD.

In Part 2, we account for multiple simulation responses. We use a mathematical programming approach; that is, we minimize one random objective under random and deterministic constraints. As in classic RSM, we locally fit functions linear in the simulation inputs. Next we apply interior point techniques to these local approximations, to estimate a search direction. This direction is scale independent. We take several steps in this direction, using binary search and statistical tests. Then we re-estimate these local linear functions, etc. Our Monte Carlo experiments demonstrate that our method indeed approaches the true optimum, in relatively few runs with the (expensive) simulation model.

REFERENCES

Angün, E., D. Den Hertog, G. Gürkan, and J. P. C. Kleijnen. 2002. Constrained response surface methodology for simulation with multiple responses. CentER Working Paper.
Barnes, E. R. 1986. A variation on Karmarkar's algorithm for solving linear programming problems. Mathematical Programming 36: 174-182.
Box, G. E. P. 1999. Statistics as a catalyst to learning by scientific method, part II: a discussion. Journal of Quality Technology 31 (1): 16-29.
Box, G. E. P. and K. B. Wilson. 1951. On the experimental attainment of optimum conditions. Journal of the Royal Statistical Society, Series B 13 (1): 1-38.
Conn, A. R., N. Gould, and Ph. L. Toint. 2000. Trust Region Methods. Philadelphia: SIAM.
Donohue, J. M., E. C. Houck, and R. H. Myers. 1993. Simulation designs and correlation induction for reducing order bias in first-order response surfaces. Operations Research 41 (5): 880-902.
---. 1995. Simulation designs for the estimation of response surface gradients in the presence of model misspecification. Management Science 41 (2): 244-262.
Irizarry, M., J. R. Wilson, and J. Trevino. 2001. A flexible simulation tool for manufacturing-cell design, II: response surface analysis and case study. IIE Transactions 33: 837-846.
Khuri, A. I. 1996. Multiresponse surface methodology. In Handbook of Statistics, ed. S. Ghosh and C. R. Rao, 377-406. Amsterdam: Elsevier.
Khuri, A. I. and J. A. Cornell. 1996. Response Surfaces: Designs and Analyses. 2nd ed. New York: Marcel Dekker.

Kleijnen, J. P. C. 1993. Simulation and optimization in production planning: a case study. Decision Support Systems 9: 269-280.
---. 1998. Experimental design for sensitivity analysis, optimization, and validation of simulation models. In Handbook of Simulation, ed. J. Banks, 173-223. New York: John Wiley & Sons.
Kleijnen, J. P. C., D. Den Hertog, and E. Angün. 2002. Response surface methodology's steepest ascent and step size revisited. CentER Working Paper.
Law, A. M. and W. D. Kelton. 2000. Simulation Modeling and Analysis. 3rd ed. Boston: McGraw-Hill.
Myers, R. H. 1999. Response surface methodology: current status and future directions. Journal of Quality Technology 31 (1): 30-74.
Myers, R. H. and D. C. Montgomery. 1995. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. New York: John Wiley & Sons.
Nedermeijer, H. G., G. J. van Oortmarssen, N. Piersma, and R. Dekker. 2000. A framework for response surface methodology for simulation optimization models. In Proceedings of the 2000 Winter Simulation Conference, ed. J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick, 129-136. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.
Ruud, P. A. 2000. An Introduction to Classical Econometric Theory. New York: Oxford University Press.
Safizadeh, M. H. 2002. Minimizing the bias and variance of the gradient estimate in RSM simulation studies. European Journal of Operational Research 136 (1): 121-135.
Spall, J. C. 1999. Stochastic optimization and the simultaneous perturbation method. In Proceedings of the 1999 Winter Simulation Conference, ed. P. A. Farrington, H. B. Nembhard, D. T. Sturrock, and G. W. Evans, 101-109. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers.

AUTHOR BIOGRAPHIES

EBRU ANGÜN is a Ph.D. student at the Department of Information Management at Tilburg University in the Netherlands, since 2000. Her e-mail address is <M.E.Angun@uvt.nl>.

JACK P.C. KLEIJNEN is a Professor of Simulation and Information Systems. His research concerns simulation, mathematical statistics, information systems, and logistics; this research resulted in six books and nearly 200 articles. He has been a consultant for several organizations in the USA and Europe, and has served on many international editorial boards and scientific committees. He spent several years in the USA, at both universities and companies, and received a number of international fellowships and awards. His e-mail and web address are <kleijnen@uvt.nl> and <http://www.tilburguniversity.nl/faculties/few/im/staff/kleijnen/>.

DICK DEN HERTOG is a Professor of Operations Research and Management Science at the Center for Economic Research (CentER), within the Faculty of Economics and Business Administration at Tilburg University, in the Netherlands. He received his Ph.D. (cum laude) in 1992. From 1992 until 1999 he was a consultant for optimization at CQM in Eindhoven. His research concerns deterministic and stochastic simulation-based optimization and nonlinear programming, with applications in logistics and production. His e-mail and web address are <D.denHertog@uvt.nl> and <http://cwis.kub.nl/~few5/center/staff/hertog>.

GÜL GÜRKAN is an Associate Professor of Operations Research and Management Science at the Center for Economic Research (CentER), within the Faculty of Economics and Business Administration at Tilburg University, in the Netherlands. She received her Ph.D. in Industrial Engineering from the University of Wisconsin-Madison in 1996. Her research interests include simulation, mathematical programming, stochastic optimization, and equilibrium models with applications in logistics, production, telecommunications, economics, and finance. She is a member of INFORMS. Her e-mail and web address are <ggurkan@uvt.nl> and <http://infolab.kub.nl/people/ggurkan/>.